In this paper, we propose a feature selection method to extract functional structures embedded in multidimensional data. In our approach, we do not approximate functional structures directly. Instead, we focus on the seemingly trivial property that functional structures are geometrically thin in an informative subspace. Using this property, we can exclude features that are irrelevant for describing functional structures. As a result, conventional identification methods, applied only to the informative features, can accurately identify functional structures. In this paper, we define Geometrical Thickness (GT) in the Cartesian System Model (CSM), a mathematical model that can manipulate symbolic data. Additionally, we define Total Geometrical Thickness (TGT), which expresses the geometrical structure of the data. Using TGT, we develop a new feature selection method and demonstrate its capabilities by applying it to two sets of artificial data and one set of real data.
Fumitaka KIMURA Shuji NISHIKAWA Tetsushi WAKABAYASHI Yasuji MIYAKE Toshio TSUTSUMIDA
This paper consists of two parts. The first part is devoted to a comparative study of handwritten ZIP code numeral recognition using seventeen typical feature vectors and seven statistical classifiers. This part is the counterpart of the sister paper "Handwritten Postal Code Recognition by Neural Network - A Comparative Study" in this special issue. In the second part, a procedure for synthesizing a new feature vector from the original feature vectors is studied. Because the synthesized feature vector is of high dimensionality, the effect of dimension reduction on classification accuracy is also examined. The best synthesized feature vector, of size 400, achieves remarkably higher recognition accuracy than any of the original feature vectors in a recognition experiment using a large number of numeral samples collected from real postal ZIP codes.